Failover
'Introduction'

Failover is a redundancy mechanism that automatically switches operation to a backup or standby device or network when the running device or network fails unexpectedly.

'Background'

In traditional network systems, the failure of a single device can stop the whole system from working. Engineers cannot always find and repair the faulty device quickly, and in some backbone systems even a few seconds of downtime can cause significant losses. To give mission-critical systems high fault tolerance and reliability, failover provides backup or standby components that take over automatically, so that the system keeps working continuously and smoothly. End users and service providers therefore see little or no interruption.

'Applications in Different Devices'

Failover can be applied to many parts of a system and at different layers of devices to keep systems stable and reliable. Based on the device type, it is discussed here in four areas: servers, routers, switches and port-channels.

'Servers Failover'

Failover Cluster is a server-level redundancy solution provided by Microsoft. It prevents the failure of a single server from causing a service outage, which addresses a problem every network center or data center faces: keeping services available even when servers go down. Servers may go down accidentally, for example because of a power supply failure, or they may be shut down deliberately for periodic maintenance or updates. A failover cluster provides a group of servers that offer the same services. When one of the active servers goes down, another server automatically takes over its duties so that the services continue.

'Examples'

Figure.1 shows a single server providing services with no backup.
In that case, if the server goes down, the whole service is down.

A failover cluster example is shown in Figure.2. Server1 and Server2 form a single virtual server that supplies the services: Server1 is the active server and Server2 is the backup server. A heartbeat signal between the two servers is used to detect server status, and clients connect to Server10 (the virtual server) via Ethernet. If Server1 terminates abnormally, Server2 stops receiving its heartbeat. Once no heartbeat has arrived for a predefined period (generally three heartbeat periods), Server2 automatically takes over and keeps providing services to the clients.

Figure.1. Single Server (only one server provides the service; if it goes down, the whole service is down)
Figure.2. Failover Cluster (the cluster keeps providing the service even if one of its servers goes down)

'Routers Failover'

Hot Standby Router Protocol (HSRP) is a proprietary protocol defined by Cisco. It provides redundancy and failover for the default gateway, so that the network remains available even if one of the routers goes down. Routers running HSRP share the same virtual IP address and MAC address and are configured with different priorities; the router with the highest priority becomes the active router. When the active router goes down, the standby router takes over. The active router's priority can also be lowered automatically, for example when a tracked interface fails. After a failed router recovers, the priorities are compared again to decide which router should be active; a recovered router with a higher priority takes back the active role only if preemption is configured.

Virtual Router Redundancy Protocol (VRRP) is an open standard protocol, defined by the IETF, with a similar function to HSRP.

'Switches Failover'

Spanning Tree Protocol (STP) is used to prevent bridge loops and the broadcast radiation they cause. It also provides network redundancy and automatic failover to keep network access available. When the network devices start up, the switches calculate and select the best paths so that the resulting topology contains no loops.
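The tree calculation described above can be sketched as a small graph computation. This is a simplified model under stated assumptions, not the real STP state machine: it omits BPDUs, timers and port states, and the switch names, bridge IDs and link costs are hypothetical. It keeps two real STP ideas: the switch with the lowest bridge ID (priority, then MAC address) becomes the root, and each switch keeps only its lowest-cost path to the root, so every other link is blocked.

```python
import heapq


def spanning_tree(bridge_ids, links):
    """Simplified STP: elect a root, keep lowest-cost paths to it, block the rest.

    bridge_ids: switch name -> (priority, MAC) tuple; the lowest value wins.
    links: frozenset({switch_a, switch_b}) -> link cost (hypothetical values).
    Returns (root, active_links, blocked_links).
    """
    # Root election: the switch with the lowest bridge ID becomes the root.
    root = min(bridge_ids, key=lambda sw: bridge_ids[sw])

    neighbours = {sw: [] for sw in bridge_ids}
    for link, cost in links.items():
        a, b = tuple(link)
        neighbours[a].append((b, cost))
        neighbours[b].append((a, cost))

    # Dijkstra from the root; equal-cost paths are broken by the lower
    # bridge ID of the upstream switch.
    best = {root: (0, bridge_ids[root])}
    upstream = {}
    queue = [(0, bridge_ids[root], root)]
    settled = set()
    while queue:
        cost, _, sw = heapq.heappop(queue)
        if sw in settled:
            continue
        settled.add(sw)
        for nbr, link_cost in neighbours[sw]:
            candidate = (cost + link_cost, bridge_ids[sw])
            if nbr not in settled and (nbr not in best or candidate < best[nbr]):
                best[nbr] = candidate
                upstream[nbr] = sw
                heapq.heappush(queue, (candidate[0], candidate[1], nbr))

    # Links on a switch's best path to the root forward; all others are blocked.
    active = {frozenset({sw, up}) for sw, up in upstream.items()}
    blocked = set(links) - active
    return root, active, blocked


# Hypothetical three-switch triangle with equal priorities and link costs.
bridges = {
    "SW1": (32768, "00:00:00:00:00:01"),
    "SW2": (32768, "00:00:00:00:00:02"),
    "SW3": (32768, "00:00:00:00:00:03"),
}
links = {
    frozenset({"SW1", "SW2"}): 4,
    frozenset({"SW1", "SW3"}): 4,
    frozenset({"SW2", "SW3"}): 4,
}
root, active, blocked = spanning_tree(bridges, links)
print(root, sorted(sorted(link) for link in blocked))   # SW1 [['SW2', 'SW3']]

# Topology change: the SW1-SW2 link fails, so the tree is recalculated and
# the previously blocked SW2-SW3 link becomes the backup path.
del links[frozenset({"SW1", "SW2"})]
root, active, blocked = spanning_tree(bridges, links)
```

Rerunning the whole calculation after a failure is a simplification chosen for clarity; real switches converge incrementally by exchanging messages.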
During this calculation, a root switch and a backup switch are selected, and some ports are blocked so that there are no loops in the network. When the network topology changes (for example, a switch shuts down or a link fails), the switches calculate a new tree so that the network remains available, and the backup links that were previously blocked are used automatically.

'Port-Channels Failover'

EtherChannel is also a Cisco proprietary technology. It is designed to increase the capacity of trunk links, relieve bandwidth bottlenecks and provide load balancing. An EtherChannel bundles several physical ports into one logical port-channel: up to eight ports are in use at the same time, and additional ports can be kept as backups. When one or more of the ports in use become unavailable, the backup ports are brought up automatically to restore the bandwidth of the EtherChannel.
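The active and backup ports of an EtherChannel can be pictured with a small model. It is a sketch under stated assumptions: at most eight member ports are active, as stated above; the remaining healthy ports wait as standby; ports are promoted in list order, which is a hypothetical stand-in for the LACP port-priority rules a real switch applies; and the port names and the `PortChannel` class are hypothetical.

```python
MAX_ACTIVE = 8   # at most eight member ports carry traffic at the same time


class PortChannel:
    """Hypothetical, simplified EtherChannel bundle with active and standby ports.

    Ports are listed in preference order, a stand-in for the LACP
    port-priority rules a real switch would apply.
    """

    def __init__(self, ports):
        self.ports = list(ports)
        self.down = set()

    def _healthy(self):
        return [p for p in self.ports if p not in self.down]

    def active(self):
        # The first MAX_ACTIVE healthy ports carry traffic.
        return self._healthy()[:MAX_ACTIVE]

    def standby(self):
        # Any further healthy ports wait as backups.
        return self._healthy()[MAX_ACTIVE:]

    def fail(self, port):
        # A failed member leaves the bundle; the next standby port moves up.
        self.down.add(port)


pc = PortChannel(f"Gi0/{i}" for i in range(1, 11))   # ten hypothetical ports
print(pc.active())    # Gi0/1 through Gi0/8
print(pc.standby())   # ['Gi0/9', 'Gi0/10']
pc.fail("Gi0/3")
print(len(pc.active()), pc.standby())   # 8 ['Gi0/10'] (Gi0/9 is now active)
```

The bundle still has eight active ports after the failure, which is the "restore the bandwidth" behavior the text describes.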
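The heartbeat rule from the failover cluster example under 'Servers Failover' can be sketched as a simple timeout check. This is a minimal model, not how Microsoft Failover Cluster is implemented: only the "three missed heartbeat periods" threshold comes from the text, while the one-second heartbeat period, the `StandbyServer` class and the simulated clock are hypothetical.

```python
HEARTBEAT_PERIOD = 1.0   # seconds between heartbeats (hypothetical value)
MISSED_LIMIT = 3         # take over after three missed periods, as in the example


class StandbyServer:
    """Hypothetical backup node (Server2) that watches the active node's heartbeat."""

    def __init__(self, now=0.0):
        self.last_heartbeat = now
        self.active = False   # becomes True once this node has taken over

    def on_heartbeat(self, now):
        # Record the time of the latest heartbeat from the active server.
        self.last_heartbeat = now

    def check(self, now):
        # Take over once no heartbeat has arrived for MISSED_LIMIT periods.
        silence = now - self.last_heartbeat
        if not self.active and silence >= MISSED_LIMIT * HEARTBEAT_PERIOD:
            self.active = True
        return self.active


# Simulated clock (no real sleeping): Server1 sends heartbeats at t = 1, 2, 3
# and then terminates abnormally.
server2 = StandbyServer(now=0.0)
for t in (1.0, 2.0, 3.0):
    server2.on_heartbeat(t)
    server2.check(t)
print(server2.check(5.0))   # False: only two periods without a heartbeat
print(server2.check(6.0))   # True: three periods missed, Server2 takes over
```

Passing the current time in explicitly, rather than reading a real clock, keeps the takeover rule easy to test.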
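The HSRP election and failover described under 'Routers Failover' can be modelled as follows. It is a simplified sketch, not Cisco's implementation: it omits HSRP hello and hold timers and messages, and the router names, priorities and addresses are hypothetical. It assumes HSRP's tie-break (the higher interface IP address wins when priorities are equal) and that a recovered higher-priority router takes the active role back only when preemption is enabled, which is off by default in HSRP.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Router:
    name: str
    priority: int
    ip: str          # the router's own interface address (hypothetical)
    up: bool = True


def elect_active(routers):
    """Highest priority wins; on a tie, the higher interface IP address wins."""
    candidates = [r for r in routers if r.up]
    if not candidates:
        return None
    return max(candidates, key=lambda r: (r.priority, ipaddress.ip_address(r.ip)))


class StandbyGroup:
    """Routers sharing one virtual IP/MAC; only the active router forwards for it."""

    def __init__(self, routers, virtual_ip, preempt=False):
        self.routers = routers
        self.virtual_ip = virtual_ip
        self.preempt = preempt
        self.active = elect_active(routers)

    def update(self):
        if self.active is None or not self.active.up:
            # Failover: the active router is gone, so the best remaining router takes over.
            self.active = elect_active(self.routers)
        elif self.preempt:
            # Preemption: a router with a strictly higher priority reclaims the role.
            best = elect_active(self.routers)
            if best.priority > self.active.priority:
                self.active = best
        return self.active


r1 = Router("R1", priority=110, ip="10.0.0.2")
r2 = Router("R2", priority=100, ip="10.0.0.3")
group = StandbyGroup([r1, r2], virtual_ip="10.0.0.1", preempt=True)
print(group.active.name)     # R1 (higher priority)
r1.up = False
print(group.update().name)   # R2 takes over the virtual IP
r1.up = True
print(group.update().name)   # R1 again, because preemption is enabled
```

With `preempt=False`, R2 would keep the active role after R1 recovers, until R2 itself fails.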